Classifier performance


The Statistical Validation of Innovation Lens

Radaelli, Giacomo, Lynch, Jonah

arXiv.org Artificial Intelligence

Information overload and the rapid pace of scientific advancement make it increasingly difficult to evaluate and allocate resources to new research proposals. Is there a structure to scientific discovery that could inform such decisions? We present statistical evidence for such structure by training a classifier that successfully predicts high-citation research papers from 2010 to 2024 in the Computer Science, Physics, and PubMed domains.
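As a rough illustration of the prediction task this abstract describes (and not the authors' actual method), one could threshold citation counts at the top decile and fit a simple text classifier on abstracts; the threshold, features, and model below are all illustrative assumptions.

```python
# Hypothetical sketch: label papers whose citation counts fall in the top decile
# and fit a simple text classifier on their abstracts. Threshold, features, and
# model choice are assumptions, not the paper's actual pipeline.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_high_citation_classifier(abstracts, citation_counts, top_quantile=0.9):
    counts = np.asarray(citation_counts)
    labels = (counts >= np.quantile(counts, top_quantile)).astype(int)
    model = make_pipeline(TfidfVectorizer(max_features=50_000),
                          LogisticRegression(max_iter=1000))
    return model.fit(abstracts, labels)
```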


When Plants Respond: Electrophysiology and Machine Learning for Green Monitoring Systems

Buss, Eduard, Aust, Till, Hamann, Heiko

arXiv.org Artificial Intelligence

Living plants, while contributing to ecological balance and climate regulation, also function as natural sensors capable of transmitting information about their internal physiological states and surrounding conditions. This rich source of data offers potential applications in environmental monitoring and precision agriculture. By integrating living plants into biohybrid systems, we establish novel channels of physiological signal flow between plants and artificial devices. We equipped *Hedera helix* with a plant-wearable device called PhytoNode to continuously record the plant's electrophysiological activity. We deployed plants in an uncontrolled outdoor environment to map electrophysiological patterns to environmental conditions. Over five months, we collected data that we analyzed using state-of-the-art methods and automated machine learning (AutoML). Our classification models achieve high performance, reaching macro F1 scores of up to 95 percent on binary tasks. AutoML approaches outperformed manual tuning, and selecting subsets of statistical features further improved accuracy. Our biohybrid living system monitors the electrophysiology of plants in harsh, real-world conditions. This work advances scalable, self-sustaining, plant-integrated living biohybrid systems for sustainable environmental monitoring.
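A minimal sketch of the kind of pipeline the abstract implies: statistical features extracted per signal window, scored with the macro-F1 metric it reports. The specific feature set here is an assumption, not the paper's.

```python
# Illustrative sketch, not the authors' pipeline: a small statistical feature set
# per signal window and the macro-F1 score reported in the abstract (up to 0.95).
import numpy as np
from sklearn.metrics import f1_score

def window_features(window: np.ndarray) -> np.ndarray:
    """Summary statistics for one window of the electrophysiological signal."""
    return np.array([window.mean(), window.std(), window.min(), window.max(),
                     np.percentile(window, 75) - np.percentile(window, 25)])

def macro_f1(y_true, y_pred) -> float:
    """Macro-averaged F1 for a binary classification task."""
    return f1_score(y_true, y_pred, average="macro")
```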


MLMC: Interactive multi-label multi-classifier evaluation without confusion matrices

Doknic, Aleksandar, Möller, Torsten

arXiv.org Artificial Intelligence

Machine learning-based classifiers are commonly evaluated by metrics like accuracy, but deeper analysis is required to understand their strengths and weaknesses. MLMC is a visual exploration tool that tackles the challenge of multi-label classifier comparison and evaluation. It offers a scalable alternative to confusion matrices, which are commonly used for such tasks but do not scale well to large numbers of classes or labels. Additionally, MLMC allows users to view classifier performance from an instance perspective, a label perspective, and a classifier perspective. Our user study shows that the techniques implemented in MLMC enable powerful multi-label classifier evaluation while remaining user-friendly.
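For context, the label and instance perspectives mentioned above can be computed directly from multi-label indicator matrices rather than confusion matrices; the sketch below is a generic scikit-learn version, not the MLMC tool itself.

```python
# A minimal sketch (not the MLMC tool) of two of the perspectives the abstract
# mentions, computed from multi-label indicator matrices.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def label_perspective(y_true, y_pred, label_names):
    """Per-label precision, recall, and F1."""
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=None, zero_division=0)
    return {name: (p[i], r[i], f1[i]) for i, name in enumerate(label_names)}

def instance_perspective(y_true, y_pred):
    """Per-instance number of label mismatches (Hamming distance)."""
    return np.abs(np.asarray(y_true) - np.asarray(y_pred)).sum(axis=1)
```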


A Classification Benchmark for Artificial Intelligence Detection of Laryngeal Cancer from Patient Speech

Paterson, Mary, Moor, James, Cutillo, Luisa

arXiv.org Artificial Intelligence

Cases of laryngeal cancer are predicted to rise significantly in the coming years. Current diagnostic pathways cause many patients to be incorrectly referred to urgent suspected cancer pathways, putting undue stress on both patients and the medical system. Artificial intelligence offers a promising solution by enabling non-invasive detection of laryngeal cancer from patient speech, which could help prioritise referrals more effectively and reduce inappropriate referrals of non-cancer patients. To realise this potential, open science is crucial. A major barrier in this field is the lack of open-source datasets and reproducible benchmarks, forcing researchers to start from scratch. Our work addresses this challenge by introducing a benchmark suite comprising 36 models trained and evaluated on open-source datasets. These models are accessible in a public repository, providing a foundation for future research. The suite covers three different algorithms and three audio feature sets, offering a comprehensive benchmarking framework. We propose standardised metrics and evaluation methodologies to ensure consistent and comparable results across future studies. The presented models include both audio-only inputs and multimodal inputs that incorporate demographic and symptom data, enabling their application to datasets with diverse patient information. By providing these benchmarks, future researchers can evaluate their datasets, refine the models, and use them as a foundation for more advanced approaches. This work aims to provide a baseline for establishing reproducible benchmarks, enabling researchers to compare new methods against these standards and ultimately advancing the development of AI tools for detecting laryngeal cancer.
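One way the multimodal inputs described above might be assembled is to concatenate an audio feature vector with demographic and symptom features; the specific fields and encodings below are assumptions, not the benchmark's definitions.

```python
# Illustrative sketch of a multimodal input: audio features concatenated with
# demographic and symptom features before classification. Fields and encodings
# are assumptions.
import numpy as np

def multimodal_vector(audio_features: np.ndarray, age: float,
                      sex_is_male: bool, has_hoarseness: bool) -> np.ndarray:
    demographic = np.array([age / 100.0, float(sex_is_male), float(has_hoarseness)])
    return np.concatenate([audio_features, demographic])
```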


ABROCA Distributions For Algorithmic Bias Assessment: Considerations Around Interpretation

Borchers, Conrad, Baker, Ryan S.

arXiv.org Machine Learning

Algorithmic bias is of critical concern within education as it could undermine the effectiveness of learning analytics. While different definitions and conceptualizations of algorithmic bias and fairness exist [2], their common denominator typically revolves around systematic unfairness or unequal treatment of groups caused by algorithms. This bias occurs when an algorithm produces results that disproportionately disadvantage or favor particular groups of people based on non-malleable characteristics like race, gender, or socioeconomic status [7]. Recent learning analytics research has argued that, although the vast majority of published papers investigating algorithmic bias in education find evidence of bias [2], some predictive models appear to achieve fairness, with minimal difference in model quality across demographic groups. For example, Zambrano et al. [18] evaluated careless detectors and Bayesian knowledge tracing models, finding near-equal performance across groups defined by race, gender, socioeconomic status, special needs, and English language learner status. Similarly, Jiang and Pardos [10] compared the accuracies of grade prediction models across ethnic groups, concluding that an adversarial learning approach led to the fairest models, but did not engage with the question of whether their fairest model was sufficiently fair.
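The ABROCA statistic in the title is commonly understood as the absolute area between the ROC curves of two demographic groups. Under that reading, a sketch with scikit-learn might look like the following; the two-group restriction and the interpolation grid are illustrative choices, not prescriptions from the paper.

```python
# Sketch of the ABROCA statistic (absolute between-ROC area) for two groups,
# using scikit-learn ROC curves; grid size and two-group assumption are
# illustrative choices.
import numpy as np
from sklearn.metrics import roc_curve

def abroca(y_true, y_score, group, n_grid=1001):
    """Integrate |ROC_a(fpr) - ROC_b(fpr)| over a shared FPR grid."""
    grid = np.linspace(0.0, 1.0, n_grid)
    interpolated = []
    for g in np.unique(group):
        mask = np.asarray(group) == g
        fpr, tpr, _ = roc_curve(np.asarray(y_true)[mask], np.asarray(y_score)[mask])
        interpolated.append(np.interp(grid, fpr, tpr))
    a, b = interpolated  # assumes exactly two groups
    return np.trapz(np.abs(a - b), grid)
```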


Use Random Selection for Now: Investigation of Few-Shot Selection Strategies in LLM-based Text Augmentation for Classification

Cegin, Jan, Pecher, Branislav, Simko, Jakub, Srba, Ivan, Bielikova, Maria, Brusilovsky, Peter

arXiv.org Artificial Intelligence

Generative large language models (LLMs) are increasingly used for data augmentation tasks, where text samples are paraphrased (or generated anew) and then used for classifier fine-tuning. Existing works on augmentation leverage few-shot scenarios, where samples are given to LLMs as part of prompts, leading to better augmentations. Yet, the samples are mostly selected randomly, and a comprehensive overview of the effects of other (more ``informed'') sample selection strategies is lacking. In this work, we compare sample selection strategies from the few-shot learning literature and investigate their effects in LLM-based textual augmentation. We evaluate them on in-distribution and out-of-distribution classifier performance. Results indicate that, while some ``informed'' selection strategies increase the performance of models, especially for out-of-distribution data, this happens only seldom and with marginal performance increases. Unless further advances are made, random sample selection remains a good default for augmentation practitioners.
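A minimal sketch of the random-selection baseline the abstract recommends as a default; the prompt template is an illustrative assumption, not the authors' exact wording.

```python
# Random few-shot selection for an augmentation prompt (illustrative template).
import random

def build_augmentation_prompt(seed_text, example_pool, k=3, seed=0):
    rng = random.Random(seed)
    shots = rng.sample(example_pool, k)  # random selection of few-shot examples
    examples = "\n".join(f"- {s}" for s in shots)
    return ("Paraphrase the sentence below so that its label is preserved.\n"
            f"Examples from the same class:\n{examples}\n"
            f"Sentence: {seed_text}\nParaphrase:")
```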


Data Alchemy: Mitigating Cross-Site Model Variability Through Test Time Data Calibration

Parida, Abhijeet, Alomar, Antonia, Jiang, Zhifan, Roshanitabrizi, Pooneh, Tapp, Austin, Ledesma-Carbayo, Maria, Xu, Ziyue, Anwar, Syed Muhammed, Linguraru, Marius George, Roth, Holger R.

arXiv.org Artificial Intelligence

Deploying deep learning-based imaging tools across various clinical sites poses significant challenges due to inherent domain shifts and regulatory hurdles associated with site-specific fine-tuning. For histopathology, stain normalization techniques can mitigate discrepancies, but they often fall short of eliminating inter-site variations. Therefore, we present Data Alchemy, an explainable stain normalization method combined with test-time data calibration via a template learning framework to overcome barriers in cross-site analysis. Data Alchemy handles shifts inherent to multi-site data and minimizes them without needing to change the weights of the normalization or classifier networks. Our approach extends to unseen sites in various clinical settings where data domain discrepancies are unknown. Extensive experiments highlight the efficacy of our framework in tumor classification on hematoxylin and eosin-stained patches. Our explainable normalization method boosts the classification area under the precision-recall curve (AUPR) by 0.165, from 0.545 to 0.710. Data Alchemy further reduces the multi-site classification domain gap, improving the 0.710 AUPR by an additional 0.142 and elevating classification performance to 0.852, up from the 0.545 baseline. Our Data Alchemy framework can popularize precision medicine with minimal operational overhead by allowing the seamless integration of pre-trained deep learning-based clinical tools across multiple sites.
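The reported gains compose as simple arithmetic; the sketch below restates them and shows how AUPR is typically computed, with `average_precision_score` as an assumed (common) stand-in for the paper's exact evaluation code.

```python
# AUPR gains as reported in the abstract, plus a typical AUPR computation.
from sklearn.metrics import average_precision_score

baseline = 0.545                  # no normalization
normalized = baseline + 0.165     # 0.710 with explainable stain normalization
calibrated = normalized + 0.142   # 0.852 after test-time data calibration

def aupr(y_true, y_score) -> float:
    """Area under the precision-recall curve (average precision)."""
    return average_precision_score(y_true, y_score)
```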


Dataset Representativeness and Downstream Task Fairness

Borza, Victor, Estornell, Andrew, Ho, Chien-Ju, Malin, Bradley, Vorobeychik, Yevgeniy

arXiv.org Artificial Intelligence

Our society collects data on people for a wide range of applications, from building a census for policy evaluation to running meaningful clinical trials. To collect data, we typically sample individuals with the goal of accurately representing a population of interest. However, current sampling processes often collect data opportunistically from data sources, which can lead to datasets that are biased and not representative, i.e., the collected dataset does not accurately reflect the demographic distribution of the true population. This is a concern because subgroups within the population can be under- or over-represented in a dataset, which may harm generalizability and lead to an unequal distribution of benefits and harms from downstream tasks that use such datasets (e.g., algorithmic bias in medical decision-making algorithms). In this paper, we assess the relationship between dataset representativeness and the group fairness of classifiers trained on that dataset. We demonstrate that there is a natural tension between dataset representativeness and classifier fairness; empirically, we observe that training datasets with better representativeness can frequently result in classifiers with higher rates of unfairness. We provide some intuition as to why this occurs via a set of theoretical results for the case of univariate classifiers. We also find that over-sampling underrepresented groups can result in classifiers that exhibit greater bias toward those groups. Lastly, we observe that fairness-aware sampling strategies (i.e., those specifically designed to select data with high downstream fairness) will often over-sample members of majority groups. These results demonstrate that the relationship between dataset representativeness and downstream classifier fairness is complex; balancing these two quantities requires special care from both model and dataset designers.
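To make the two quantities being related concrete, here is a small sketch of one plausible instantiation: a representativeness gap measured as total variation distance between sample and population group proportions, and a demographic parity gap as the fairness measure. Both metric choices are assumed stand-ins, not necessarily the paper's definitions.

```python
# Illustrative metrics for representativeness and group fairness (assumed choices).
import numpy as np

def representativeness_gap(sample_groups, population_props):
    """Total variation distance between sample and population group proportions."""
    groups, counts = np.unique(sample_groups, return_counts=True)
    sample_props = counts / counts.sum()
    pop = np.array([population_props[g] for g in groups])
    return 0.5 * np.abs(sample_props - pop).sum()

def demographic_parity_gap(y_pred, groups):
    """Largest difference in positive-prediction rate across groups."""
    rates = [np.asarray(y_pred)[np.asarray(groups) == g].mean()
             for g in np.unique(groups)]
    return max(rates) - min(rates)
```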


Practical Performance of a Distributed Processing Framework for Machine-Learning-based NIDS

Kajiura, Maho, Nakamura, Junya

arXiv.org Artificial Intelligence

Network Intrusion Detection Systems (NIDSs) detect intrusion attacks in network traffic. In particular, machine-learning-based NIDSs have attracted attention because of their high detection rates for unknown attacks. A distributed processing framework for machine-learning-based NIDSs employing a scalable distributed stream processing system has been proposed in the literature. However, its performance when machine-learning-based classifiers are implemented has not been comprehensively evaluated. In this study, we implement five representative classifiers (Decision Tree, Random Forest, Naive Bayes, SVM, and kNN) on this framework and evaluate their throughput and latency. Through experimental measurements, we investigate the differences in processing performance among these classifiers and identify the bottlenecks in the framework's processing performance.
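A standalone sketch (outside any distributed framework) of the throughput and latency measurement described, for the five classifier families named in the abstract; the dataset and sizes are synthetic placeholders.

```python
# Throughput/latency measurement sketch for the five classifier families named
# in the abstract, on synthetic data (placeholders, not the evaluated framework).
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=10_000, n_features=30, random_state=0)
X_train, y_train, X_test = X[:8_000], y[:8_000], X[8_000:]

for name, clf in [("Decision Tree", DecisionTreeClassifier()),
                  ("Random Forest", RandomForestClassifier()),
                  ("Naive Bayes", GaussianNB()),
                  ("SVM", SVC()),
                  ("kNN", KNeighborsClassifier())]:
    clf.fit(X_train, y_train)
    start = time.perf_counter()
    clf.predict(X_test)
    elapsed = time.perf_counter() - start
    print(f"{name}: {len(X_test) / elapsed:,.0f} records/s, "
          f"{1e3 * elapsed / len(X_test):.3f} ms/record")
```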


SHROOM-INDElab at SemEval-2024 Task 6: Zero- and Few-Shot LLM-Based Classification for Hallucination Detection

Allen, Bradley P., Polat, Fina, Groth, Paul

arXiv.org Artificial Intelligence

We describe the University of Amsterdam Intelligent Data Engineering Lab team's entry for the SemEval-2024 Task 6 competition. The SHROOM-INDElab system builds on previous work on using prompt programming and in-context learning with large language models (LLMs) to build classifiers for hallucination detection, and extends that work through the incorporation of context-specific definitions of the task, role, and target concept, together with automated generation of examples for use in a few-shot prompting approach. The resulting system achieved fourth-best and sixth-best performance in the model-agnostic and model-aware tracks of Task 6, respectively, and evaluation using the validation sets showed that the system's classification decisions were consistent with those of the crowd-sourced human labellers. We further found that a zero-shot approach provided better accuracy than a few-shot approach using automatically generated examples. Code for the system described in this paper is available on GitHub.
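A minimal sketch of the zero-shot prompting setup the abstract describes; the prompt wording, label parsing, and the `call_llm` callable are placeholders, not the team's actual code.

```python
# Zero-shot hallucination classification sketch (placeholders, not the team's code).
def build_zero_shot_prompt(role, task_definition, target_concept, output_text):
    return (f"{role}\n{task_definition}\n"
            f"Target concept: {target_concept}\n"
            f"Model output to judge: {output_text}\n"
            "Answer with 'Hallucination' or 'Not Hallucination'.")

def classify(output_text, call_llm):
    prompt = build_zero_shot_prompt(
        role="You are an annotator for the SemEval-2024 Task 6 (SHROOM) dataset.",
        task_definition="Decide whether the output is unsupported by the input it was generated from.",
        target_concept="hallucination",
        output_text=output_text,
    )
    answer = call_llm(prompt).strip().lower()
    return "Not Hallucination" if answer.startswith("not") else "Hallucination"
```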